A taxonomy for similarity metrics between Markov decision processes
Authors
Abstract
Although the notion of task similarity is potentially interesting in a wide range of areas such as curriculum learning or automated planning, it has mostly been tied to transfer learning. Transfer is based on the idea of reusing the knowledge acquired in the learning of a set of source tasks in a new learning process on a target task, assuming that the source and target tasks are close enough. In recent years, transfer learning has succeeded in making reinforcement learning (RL) algorithms more efficient (e.g., by reducing the number of samples needed to achieve (near-)optimal performance). Transfer in RL is based on the core concept of similarity: whenever the tasks are similar, the transferred knowledge can be reused to solve the target task and significantly improve the learning performance. Therefore, the selection of good metrics to measure these similarities is a critical aspect when building transfer RL algorithms, especially when this knowledge is transferred from simulation to the real world. In the literature, there are many metrics to measure the similarity between MDPs; hence, many definitions of similarity and its complement, distance, have been considered. In this paper, we propose a categorization of these metrics and analyze the definitions of similarity proposed so far, taking this categorization into account. We also follow this taxonomy to survey the existing metrics, as well as suggesting future directions for the construction of new metrics.
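To illustrate the kind of state-similarity metric the survey categorizes, the sketch below iterates a bisimulation-style fixed point on a hypothetical three-state MDP. The exact Kantorovich (optimal-transport) term of the bisimulation metric is replaced here by the independent-coupling upper bound, which coincides with it when transitions are deterministic, as in this toy example; the MDP and all names are illustrative assumptions, not taken from the paper.

```python
# Bisimulation-style distance between states of a hypothetical finite MDP:
#   d(s, t) = max_a [ |R(s, a) - R(t, a)| + gamma * K(P(.|s, a), P(.|t, a); d) ]
# The Kantorovich term K is approximated by the independent-coupling bound
#   sum_{u, v} p(u) * q(v) * d(u, v),
# which is exact for deterministic transitions, as in this toy MDP.
GAMMA = 0.9

# Toy MDP: R[s][a] = reward, P[s][a] = {next_state: probability}.
R = {0: {"a": 1.0, "b": 0.0},
     1: {"a": 1.0, "b": 0.0},
     2: {"a": 0.0, "b": 1.0}}
P = {0: {"a": {1: 1.0}, "b": {0: 1.0}},
     1: {"a": {2: 1.0}, "b": {1: 1.0}},
     2: {"a": {0: 1.0}, "b": {2: 1.0}}}

def coupling_bound(p, q, d):
    """Independent-coupling upper bound on the Kantorovich distance."""
    return sum(pu * qv * d[u, v] for u, pu in p.items() for v, qv in q.items())

def bisim_distance(n_iter=200):
    """Iterate the metric update to (numerical) convergence."""
    states = list(R)
    d = {(s, t): 0.0 for s in states for t in states}
    for _ in range(n_iter):
        d = {(s, t): max(abs(R[s][a] - R[t][a])
                         + GAMMA * coupling_bound(P[s][a], P[t][a], d)
                         for a in R[s])
             for s in states for t in states}
    return d
```

For this toy MDP the iteration converges to d(0, 1) = 9 and d(1, 2) = d(0, 2) = 10: states 0 and 1 agree on immediate rewards and only diverge one step later, which the discount factor attenuates.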
Similar resources
Bisimulation Metrics for Continuous Markov Decision Processes
In recent years, various metrics have been developed for measuring the behavioural similarity of states in probabilistic transition systems [Desharnais et al., Proceedings of CONCUR, (1999), pp. 258-273, van Breugel and Worrell, Proceedings of ICALP, (2001), pp. 421-432]. In the context of finite Markov decision processes, we have built on these metrics to provide a robust quantitative analogue...
Metrics for Finite Markov Decision Processes
Markov decision processes (MDPs) offer a popular mathematical tool for planning and learning in the presence of uncertainty (Boutilier, Dean, & Hanks 1999). MDPs are a standard formalism for describing multi-stage decision making in probabilistic environments. The objective of the decision making is to maximize a cumulative measure of longterm performance, called the return. Dynamic programming...
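The dynamic-programming view described above can be sketched in a few lines of value iteration; the two-state MDP and all names here are illustrative assumptions, not from the cited work:

```python
# Minimal value iteration on a hypothetical two-state MDP:
#   V(s) <- max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) * V(s') ]
GAMMA = 0.9

# State 1 is absorbing and pays reward 1 per step; state 0 can move to it.
R = {0: {"move": 0.0, "stay": 0.0},
     1: {"stay": 1.0}}
P = {0: {"move": {1: 1.0}, "stay": {0: 1.0}},
     1: {"stay": {1: 1.0}}}

def value_iteration(n_iter=500):
    """Repeatedly apply the Bellman optimality update until convergence."""
    V = {s: 0.0 for s in R}
    for _ in range(n_iter):
        V = {s: max(R[s][a] + GAMMA * sum(p * V[u] for u, p in P[s][a].items())
                    for a in R[s])
             for s in R}
    return V
```

Here V(1) converges to 1 / (1 - gamma) = 10 (the discounted return of staying in the absorbing state) and V(0) to gamma * V(1) = 9.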
Computing Game Metrics on Markov Decision Processes
In this paper we study the complexity of computing the game bisimulation metric defined by de Alfaro et al. on Markov Decision Processes. It is proved by de Alfaro et al. that the undiscounted version of the metric is characterized by a quantitative game μ-calculus defined by de Alfaro and Majumdar, which can express reachability and ω-regular specifications. And by Chatterjee et al. that the d...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
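The decomposition step described above (not the accelerated algorithm of the cited paper itself) can be sketched with Kosaraju's algorithm: compute the SCCs of the MDP's transition graph, which come out in a topological order of the condensation, inducing the levels. The graph below is a made-up example.

```python
# Strongly connected components of a directed transition graph via
# Kosaraju's algorithm. Components are emitted in topological order of the
# condensation (components that can reach others come first), which is the
# ordering that induces the "levels" of SCC-based decomposition methods.
from collections import defaultdict

def sccs(graph):
    """graph: dict node -> list of successor nodes. Returns list of sets."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}

    # Pass 1: record DFS finish order on the original graph.
    order, seen = [], set()
    def dfs(u):
        seen.add(u)
        for v in graph.get(u, []):
            if v not in seen:
                dfs(v)
        order.append(u)
    for u in nodes:
        if u not in seen:
            dfs(u)

    # Pass 2: DFS on the transpose graph in reverse finish order.
    rev = defaultdict(list)
    for u, vs in graph.items():
        for v in vs:
            rev[v].append(u)
    comp_of, comps = {}, []
    def assign(u, comp):
        comp_of[u] = len(comps) - 1
        comp.add(u)
        for v in rev[u]:
            if v not in comp_of:
                assign(v, comp)
    for u in reversed(order):
        if u not in comp_of:
            comps.append(set())
            assign(u, comps[-1])
    return comps
```

For example, `sccs({0: [1], 1: [0, 2], 2: [3], 3: [2]})` returns `[{0, 1}, {2, 3}]`: the component containing states 0 and 1 can reach the one containing 2 and 3, so it sits at an earlier level.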
Metrics for Markov Decision Processes with Infinite State Spaces
We present metrics for measuring state similarity in Markov decision processes (MDPs) with infinitely many states, including MDPs with continuous state spaces. Such metrics provide a stable quantitative analogue of the notion of bisimulation for MDPs, and are suitable for use in MDP approximation. We show that the optimal value function associated with a discounted infinite horizon planning tas...
Journal
Journal title: Machine Learning
Year: 2022
ISSN: 0885-6125, 1573-0565
DOI: https://doi.org/10.1007/s10994-022-06242-4